
    MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment

    Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, so introducing a chronological ordering of notes is not naturally suitable. In this paper, we propose three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs). The three models, which differ in the underlying assumptions and accordingly the network architectures, are referred to as the jamming model, the composer model and the hybrid model. We trained the proposed models on a dataset of over one hundred thousand bars of rock music and applied them to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings. A few intra-track and inter-track objective metrics are also proposed to evaluate the generative results, in addition to a subjective user study. We show that our models can generate coherent music of four bars right from scratch (i.e. without human inputs). We also extend our models to human-AI cooperative music generation: given a specific track composed by a human, we can generate four additional tracks to accompany it. All code, the dataset and the rendered audio samples are available at https://salu133445.github.io/musegan/ . Comment: to appear at AAAI 201
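As a rough illustration of the multi-track piano-roll representation mentioned in the abstract above, the following sketch stores one binary time-by-pitch grid per track. The five track names and the four-bar length come from the abstract; the time resolution (`STEPS_PER_BAR`), the pitch range (`N_PITCHES`), and all helper names are assumptions made for this example and are not taken from the MuseGAN code.

```python
# Minimal multi-track piano-roll sketch (illustrative only, not the MuseGAN code).
TRACKS = ["bass", "drums", "guitar", "piano", "strings"]  # from the abstract
N_BARS = 4            # from the abstract: four-bar phrases
STEPS_PER_BAR = 16    # assumption: sixteen time steps per bar
N_PITCHES = 128       # assumption: full MIDI pitch range


def empty_pianoroll():
    """Return one all-zero (time step x pitch) grid per track."""
    n_steps = N_BARS * STEPS_PER_BAR
    return {t: [[0] * N_PITCHES for _ in range(n_steps)] for t in TRACKS}


def add_note(roll, track, pitch, start, duration):
    """Mark `pitch` as sounding on `track` for `duration` steps from `start`."""
    end = min(start + duration, len(roll[track]))
    for step in range(start, end):
        roll[track][step][pitch] = 1


def active_pitches(roll, track, step):
    """List the pitches sounding on `track` at time step `step`."""
    return [p for p, on in enumerate(roll[track][step]) if on]


roll = empty_pianoroll()
# A C major chord on the piano track: three simultaneous notes,
# stored without any ordering among them.
for pitch in (60, 64, 67):
    add_note(roll, "piano", pitch, start=0, duration=4)
# A longer bass note that unfolds over the same time axis.
add_note(roll, "bass", 36, start=0, duration=8)

print(active_pitches(roll, "piano", 0))
print(active_pitches(roll, "bass", 7))
```

Because every track shares the same time axis, simultaneous notes such as chords are represented directly, without imposing a note-by-note sequence.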

    Revisiting the problem of audio-based hit song prediction using convolutional neural networks

    Being able to predict whether a song can be a hit has important applications in the music industry. Although it is true that the popularity of a song can be greatly affected by external factors such as social and commercial influences, to which degree audio features computed from musical signals (which we regard as internal factors) can predict song popularity is an interesting research question on its own. Motivated by the recent success of deep learning techniques, we attempt to extend previous work on hit song prediction by jointly learning the audio features and prediction models using deep learning. Specifically, we experiment with a convolutional neural network model that takes the primitive mel-spectrogram as the input for feature learning, a more advanced JYnet model that uses an external song dataset for supervised pre-training and auto-tagging, and the combination of these two models. We also consider the inception model to characterize audio information in different scales. Our experiments suggest that deep structures are indeed more accurate than shallow structures in predicting the popularity of either Chinese or Western Pop songs in Taiwan. We also use the tags predicted by JYnet to gain insights into the result of different models. Comment: To appear in the proceedings of 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP

    Self-Supervised Learning for Speech Enhancement through Synthesis

    Modern speech enhancement (SE) networks typically implement noise suppression through time-frequency masking, latent representation masking, or discriminative signal prediction. In contrast, some recent works explore SE via generative speech synthesis, where the system's output is synthesized by a neural vocoder after an inherently lossy feature-denoising step. In this paper, we propose a denoising vocoder (DeVo) approach, where a vocoder accepts noisy representations and learns to directly synthesize clean speech. We leverage rich representations from self-supervised learning (SSL) speech models to discover relevant features. We conduct a candidate search across 15 potential SSL front-ends and subsequently train our vocoder adversarially with the best SSL configuration. Additionally, we demonstrate a causal version capable of running on streaming audio with 10ms latency and minimal performance degradation. Finally, we conduct both objective evaluations and subjective listening studies to show our system improves objective metrics and outperforms an existing state-of-the-art SE model subjectively

    CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment

    Speech quality assessment has been a critical component in many voice communication related applications such as telephony and online conferencing. Traditional intrusive speech quality assessment requires the clean reference of the degraded utterance to provide an accurate quality measurement. This requirement limits the usability of these methods in real-world scenarios. On the other hand, non-intrusive subjective measurement is the "gold standard" in evaluating speech quality as human listeners can intrinsically evaluate the quality of any degraded speech with ease. In this paper, we propose a novel end-to-end model structure called Convolutional Context-Aware Transformer (CCAT) network to predict the mean opinion score (MOS) of human raters. We evaluate our model on three MOS-annotated datasets spanning multiple languages and distortion types and submit our results to the ConferencingSpeech 2022 Challenge. Our experiments show that CCAT provides promising MOS predictions compared to current state-of-the-art non-intrusive speech assessment models, with average Pearson correlation coefficient (PCC) increasing from 0.530 to 0.697 and average RMSE decreasing from 0.768 to 0.570 compared to the baseline model on the challenge evaluation test set
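The two metrics reported in the abstract above, Pearson correlation coefficient (PCC) and root-mean-square error (RMSE), can be computed as in the following sketch. The function names and the MOS values are hypothetical and chosen only for illustration; they are not data from the paper or the challenge.

```python
import math


def pearson_cc(pred, target):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)


def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences."""
    n = len(pred)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)


# Hypothetical MOS values on a 1-to-5 scale (not from the paper).
human_mos = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_mos = [1.5, 2.5, 3.5, 4.5, 5.5]  # constant bias of +0.5

print(pearson_cc(predicted_mos, human_mos))
print(rmse(predicted_mos, human_mos))
```

In this toy case the predictions carry a constant offset, so the PCC is perfect while the RMSE is not. That is one reason for reporting both: PCC measures how well the predictions track the ranking and trend of the human scores, and RMSE measures the absolute error.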

    Inscuteable and Staufen Mediate Asymmetric Localization and Segregation of prospero RNA during Drosophila Neuroblast Cell Divisions

    When neuroblasts divide, inscuteable acts to coordinate protein localization and mitotic spindle orientation, ensuring that asymmetrically localized determinants like Prospero partition into one progeny. staufen encodes a dsRNA-binding protein implicated in mRNA transport in oocytes. We demonstrate that prospero RNA is also asymmetrically localized and partitioned during neuroblast cell divisions, a process requiring both inscuteable and staufen. Inscuteable and Staufen interact and colocalize with prospero RNA on the apical cortex of interphase neuroblasts. Staufen binds prospero RNA in its 3′UTR. Our findings suggest that Inscuteable nucleates an apical complex and is required for protein localization, spindle orientation, and RNA localization. Stau, as one component of this complex, is required only for RNA localization. Hence staufen also acts zygotically, downstream of inscuteable, to effect aspects of neuroblast asymmetry

    BIOMECHANICAL ANALYSIS DURING COUNTERMOVEMENT JUMP IN CHILDREN AND ADULTS

    This study examined the biomechanical characteristics of children and adults during the countermovement jump. Seven children and seven adult males were recruited. A Peak high-speed camera (120 Hz) synchronized with a force plate (600 Hz) was used to record the vertical jumping action. The kinetic parameters were calculated using the inverse dynamics method. Results showed that the children had both immature joint function prior to propulsion and inadequate knee and ankle joint function during propulsion. It is concluded, on the basis of the kinetic analysis, that the children's group lacked a well-formed jumping strategy during vertical jumps. This information may be useful in subsequent studies of the countermovement jump: because kinematic analysis alone can miss important information, applying kinetic analysis as well gives a more complete picture in research on children's movement

    Doping and temperature dependence of electron spectrum and quasiparticle dispersion in doped bilayer cuprates

    Within the t-t'-J model, the electron spectrum and quasiparticle dispersion in doped bilayer cuprates in the normal state are discussed by considering the bilayer interaction. It is shown that the bilayer interaction splits the electron spectrum of doped bilayer cuprates into the bonding and antibonding components around the (π,0) point. The differentiation between the bonding and antibonding components is essential, which leads to two main flat bands around the (π,0) point below the Fermi energy. In analogy to the doped single layer cuprates, the lowest energy states in doped bilayer cuprates are located at the (π/2,π/2) point. Our results also show that the striking behavior of the electronic structure in doped bilayer cuprates is intriguingly related to the bilayer interaction together with strong coupling between the electron quasiparticles and collective magnetic excitations. Comment: 9 pages, 4 figures, updated references, added figures and discussions, accepted for publication in Phys. Rev.